Uniform Convergence Rates over Maximal Domains in
Structural Nonparametric Cointegrating Regression

James A. Duffy* 


Abstract 

This paper presents uniform convergence rates for kernel regression estimators, in the setting 
of a structural nonlinear cointegrating regression model. We generalise the existing literature 
in three ways. First, the domain to which these rates apply is much wider than has been 
previously considered, and can be chosen so as to contain as large a fraction of the sample as 
desired in the limit. Second, our results allow the regression disturbance to be serially correlated,
and cross-correlated with the regressor; previous work on this problem (of obtaining
uniform rates) having been confined entirely to the setting of an exogenous regressor. Third, 
we permit the bandwidth to be data-dependent, requiring only that it satisfy certain weak 
asymptotic shrinkage conditions. Our assumptions on the regressor process are consistent 
with a very broad range of departures from the standard unit root autoregressive model, allowing
the regressor to be fractionally integrated, and to have an infinite variance (and even
infinite lower-order moments). 


1 Introduction 

Whereas data on a stationary regressor will lie, with high probability, within a fixed bounded
interval of sufficient width, the randomly wandering nature of an integrated process prevents 
it from being contained within any such interval, no matter how wide. Consequently, the global 
nonparametric estimation of a regression function taking the latter as an argument is considerably 
more difficult, as it requires one to approximate the regression function on an ever-expanding 
domain - widening probabilistically at rate $n^{1/2}$ in the unit root case. The inherent randomness
of the limiting occupation density associated with the (standardised) regressor process poses 
a further challenge, complicating the identification of domains on which observations may be 
guaranteed to accumulate, in a manner that seems not to have any parallel in the case of a 
stationary regressor. 

*Email: james.duffy@economics.ox.ac.uk. This paper substantially revises and extends some results given in an
earlier paper of the author's (Duffy, 2013). The author thanks Xiaohong Chen, Bent Nielsen and Peter Phillips for
helpful comments on this paper, and the earlier work. The manuscript was prepared with LyX 2.1.3 and JabRef 2.7b.

This paper considers kernel nonparametric estimators of $m_0$ in the nonlinear cointegrating
model

$$y_t = m_0(x_t) + u_t \tag{1.1}$$

where $x_t = \sum_{s=1}^{t} v_s$ is the partial sum of a linear process $\{v_t\}$, and $\{u_t\}$ is an unobserved
disturbance process. Regarding the pointwise consistency and asymptotic normality of these estimators,
we refer in particular to Karlsen, Myklebust, and Tjøstheim (2007) and Wang and Phillips
(2009a,b). Our assumptions on the mechanism generating $\{x_t\}$ are very general, not only permitting
fractional integration of order $d$ over a range of values - where $d = 1$ corresponds to the familiar unit
root autoregressive model - but also including the case where the variance (and even lower-order
moments) of $\{v_t\}$ do not exist.

We obtain rates of uniform convergence for our estimators on a sample-dependent sequence
of domains, which correspond as nearly as possible to the entire empirical support of $\{x_t\}_{t=1}^{n}$,
in the sense that they may be chosen so as to contain as large a fraction of the data as desired
in the limit. These domains are thus maximally wide; in contrast, previous work on uniform
convergence rates in this setting has been limited to the consideration of smaller, deterministically
expanding intervals, which necessarily contain an asymptotically negligible fraction of the data
(see Wang and Wang, 2013; Chan and Wang, 2014; and Gao, Kanaya, Li, and Tjøstheim, 2015).
Being able to estimate $m_0$ uniformly on such wide domains should be especially useful in the
context of certain semiparametric estimation problems, such as arise when $m_0(x_t)$ in (1.1) is
replaced by the more general formulation $m_0(x_t'\beta_0)$, where $x_t$ is a vector nonstationary process,
and both $\beta_0$ and $m_0$ are to be jointly estimated. Clearly, only observations lying in those domains
on which $m_0$ may be (uniformly) consistently estimated would be of any use in estimating $\beta_0$;
our results suggest, reassuringly, that 'almost all' the observed sample should be available for this
purpose.

We further generalise previous work by permitting the regressor to be endogenous, and the
bandwidth sequence to be data-dependent in a very general way. Endogeneity arises naturally
in the setting of cointegrating models such as (1.1), which are so dynamically under-specified as
to be plausible only as a model of the long-run equilibrium relationship between $\{y_t\}$ and $\{x_t\}$.
The exogeneity assumption typically - though not universally, see Wang and Phillips (2009b,
2015) - imposed in these models is thus unlikely to be satisfied in applications. Moreover, in any
application, the bandwidth used to compute a kernel regression estimator will not be determined,
a priori, as some function of the sample size, but will instead be chosen with at least some
reference to the sample at hand. To better accommodate this aspect of actual empirical work,
we allow the bandwidth to be functionally dependent on the sample $\{(y_t, x_t)\}_{t=1}^{n}$, requiring only
that it satisfy a weak asymptotic shrinkage condition.
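
To make the idea of a sample-dependent bandwidth concrete, the following is a minimal sketch of a bandwidth that is computed from the data and then constrained to lie in an interval whose lower endpoint shrinks with the sample size. The function name `data_dependent_bandwidth`, the median-increment scale, and the exponents `-0.1` and `-0.25` are all illustrative assumptions; none of them is taken from the paper.

```python
import numpy as np

def data_dependent_bandwidth(x, c=1.0, h_lower=None, h_upper=1.0):
    """Hypothetical rule: scale a sample statistic, then clip to [h_lower, h_upper]."""
    n = x.size
    # Sample-dependent scale: median absolute increment of the regressor (an assumption).
    scale = np.median(np.abs(np.diff(x))) if n > 1 else 1.0
    h = c * scale * n ** (-0.1)
    if h_lower is None:
        h_lower = n ** (-0.25)      # shrinking lower endpoint (illustrative choice)
    # The bandwidth is random, but always lies in the deterministic interval.
    return float(np.clip(h, h_lower, h_upper))

x = np.cumsum(np.random.default_rng(1).standard_normal(200))
h = data_dependent_bandwidth(x)
```

The clipping step is the only structural requirement illustrated here: whatever rule is used to pick the bandwidth, the result is confined to a growing interval.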

The proofs of these convergence rates are facilitated by a number of new technical results.
The first concerns the weak convergence of the standardised signal process

$$\mathcal{L}_n(a) := \frac{1}{e_n h_n}\sum_{t=1}^{n} K\left(\frac{x_t - d_n a}{h_n}\right) \rightsquigarrow \mathcal{L}(a) \tag{1.2}$$

in $\ell_\infty(\mathbb{R})$, where: $\mathcal{L}$ denotes the occupation density associated to the finite-dimensional limit of
$X_n(r) := d_n^{-1}x_{\lfloor nr\rfloor}$; $\{e_n\}$ is a norming sequence; $K \in L^1(\mathbb{R})$ is a mean-zero kernel density function;
and $h_n = O_p(1)$ is a smoothing (bandwidth) sequence. This type of result is proved in
Duffy (2015) and is reproduced as Proposition 2.1 below. The convergence (1.2) may be loosely regarded as the
nonstationary process counterpart of the uniform convergence of the signal to the corresponding
invariant density that obtains when $\{x_t\}$ is stationary. In the present setting, $\mathcal{L}_n$ arises as the
denominator of the Nadaraya-Watson estimator, and so, when combined with a suitable characterisation
of the (random) support of $\mathcal{L}$, (1.2) allows us to identify a sequence of domains on
which the signal must accumulate at a certain probabilistic rate.

The second class of relevant technical results supplies uniform order estimates for

$$\sum_{t=1}^{n} f\left(\frac{x_t - d_n a}{h_n}\right)u_t \qquad\text{and}\qquad \sum_{t=1}^{n} g\left(\frac{x_t - d_n a}{h_n}\right)$$

where $f, g \in L^1(\mathbb{R})$, $\int g = 0$, and $\{u_t\}$ is weakly dependent. These are referred to as the covariance
and zero energy processes, and are respectively relevant for a determination of the uniform
order of the variance and bias of a kernel regression estimator. The estimates obtained here (Theorem
2.1 below) appear to be new to the literature. While Chan and Wang (2014) provide an
estimate for the covariance process when $\{x_t\}$ is exogenous, our estimate holds even when $\{u_t\}$ is
correlated with $\{x_t\}$, and is within a $\log^{1/2} n$ factor of theirs. In consequence, endogeneity of the
regressor seems to penalise the rate of convergence of a kernel regression estimator by merely a
factor of $\log^{1/2} n$.


Notation For a complete index of the notation used in this paper, see Section A.2 of the Supplement.
For deterministic sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n \sim b_n$ if $\lim_{n\to\infty} a_n/b_n = 1$, and
$a_n \asymp b_n$ if $\lim_{n\to\infty} a_n/b_n \in (-\infty,\infty)\setminus\{0\}$; for random sequences, $a_n \lesssim_p b_n$ denotes $a_n = O_p(b_n)$.
$X_n \rightsquigarrow X$ denotes weak convergence in the sense of van der Vaart and Wellner (1996), and
$X_n \rightsquigarrow_{fdd} X$ the convergence of finite-dimensional distributions. For a metric space $(Q,d)$, $\ell_\infty(Q)$
denotes the space of uniformly bounded functions on $Q$, equipped with the topology of uniform
convergence; while $\ell_{ucc}(Q)$ denotes the space of functions that are uniformly bounded on compact
subsets of $Q$, and is equipped with the topology of uniform convergence on compacta. For
$p \ge 1$, $X$ a random variable, and $f : \mathbb{R} \to \mathbb{R}$, $\|X\|_p := (E|X|^p)^{1/p}$ and $\|f\|_p := (\int_{\mathbb{R}}|f|^p)^{1/p}$. BI
denotes the space of bounded and Lebesgue integrable functions on $\mathbb{R}$. $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ respectively
denote the floor and ceiling functions. $C$ denotes a generic constant that may take different values
even at different places in the same proof; $a \lesssim b$ denotes $a \le Cb$.

2 Discussion of results 

2.1 Model and assumptions 

We are concerned with the estimation of $m_0$ in the model,

$$y_t = m_0(x_t) + u_t \tag{2.1}$$

where $\{(x_t, u_t)\}$ satisfies

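Assumption 1.

(i) $\{(\varepsilon_t, \eta_t)\}$ is a bivariate i.i.d. sequence. $\varepsilon_0$ lies in the domain of attraction of a strictly stable
distribution with index $\alpha \in (0,2]$, and has characteristic function $\psi(\lambda) := E e^{i\lambda\varepsilon_0}$ satisfying
$\psi \in L^{p_0}$ for some $p_0 \ge 1$. $E\eta_0 = 0$ and $E|\eta_0|^{q_0} < \infty$ for some $q_0 > 2$.

(ii) $\{x_t\}$ is generated according to

$$x_t := \sum_{s=1}^{t} v_s, \qquad v_t := \sum_{k=0}^{\infty}\phi_k\varepsilon_{t-k}, \tag{2.2}$$

and either

(a) $\alpha \in (1,2]$, $\sum_{k=0}^{\infty}|\phi_k| < \infty$ and $\phi := \sum_{k=0}^{\infty}\phi_k \ne 0$; or

$\phi_k \sim k^{H-1-1/\alpha}\pi_k$ for some $\{\pi_k\}$ strictly positive and slowly varying at infinity, with

(b) $H > 1/\alpha$; or

(c) $H < 1/\alpha$ and $\sum_{k=0}^{\infty}\phi_k = 0$.

In both cases (b) and (c), $H \in (\tfrac{1}{2}, 1)$.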

(iii) $\{u_t\}$, the regression disturbance, is the linear process

$$u_t := \sum_{k=0}^{\infty}\theta_k\eta_{t-k} \tag{2.3}$$

'E’k=o\^k\k^^^ < °°- 

Remark 2.1. The preceding conditions may be compared with those imposed by Wang and Phillips
(2009b, 2015); but quite unlike those authors, we do not require $E\varepsilon_0^2 < \infty$. Although our
assumptions are consistent with substantial departures from the standard unit root model - which
here coincides with (ii)(a) with $\alpha = 2$ - $\{x_t\}$ is in all cases a partial sum process, and this feature
of the generating mechanism identifies (2.1) as a nonlinear cointegrating regression, in the terminology
of Park and Phillips (2001). To allow the alternative forms of (ii) to be more concisely
referenced, we shall regard part (a) as corresponding to the case where $H = 1/\alpha$.

The arguments used in this paper could be adapted to derive our main results when $H \in (0, \tfrac{1}{2}]$.
However, for $H$ falling within this range, certain simplifications - resulting, in particular, from
parts (i) and (ii) of Lemma 3.4 below - are unavailable, and the statement of our results would
take a more complicated form. To keep this paper to a reasonable length, we have therefore restricted
attention to $H \in (\tfrac{1}{2}, 1)$. (This restriction is also necessary for certain related results in the literature:
in particular, see Theorems 4 and 5 in Jeganathan, 2008.)

Remark 2.2. Part (iii) permits the regression disturbance to be serially dependent, and cross-
correlated with the regressor; (2.1) is thus a structural model. Wang and Phillips (2015) allow
$\{u_t\}$ to be generated in a slightly more general manner, according to

00 

“t = -■ Uit + U2t. 

k=0 

By considering separately the cases in which $u_t = u_{1t}$ and $u_t = u_{2t}$, it is easily seen that Theorems
2.1 and 2.2 below hold, without any modification, when $u_t = u_{1t} + u_{2t}$, provided that

< oo and g is such that Eg^(eo) < oo. 

Instead of (hi), we might have required to be a martingale difference sequence 

(m.d.s.), where tF,. '.= crdXj, u^-i}s<t) (see e.g. Park and Phillips, 2001, Ass. 2.1); we say that {u^} 
is an exogenous m.d.s. in this case. While this alternative assumption would be very convenient,
it seems rather restrictive in the setting of such an 'under-specified' model as (2.1), in which any
short-run dynamics affecting the relationship between $y_t$ and $x_t$ must be absorbed into $u_t$.

Remark 2.3. The requirement that < oo is stronger than is necessary to ensure the 

pointwise consistency and asymptotic normality of kernel regression estimators in this setting: see 
for example Wang and Phillips (2015), who merely assume that Were (hi) to 

be relaxed in this direction, then the arguments used to prove the main results of this paper could 
still be applied, but the rates of convergence obtained would be complicated by the presence of 
an additional term, the magnitude of which would depend, in a somewhat complicated manner, 
on the rate at which —> 0 as k —> oo. 

We shall consider both local level (Nadaraya-Watson) and local linear estimators of $m_0$, to be
denoted by $\hat m$ and $\hat m_1$ respectively. To facilitate nonparametric estimation, we require

Assumption 2. $m_0$ is twice continuously differentiable.

Remark 2.4. The preceding is stronger than is necessary for a determination of the convergence
rates of these estimators; our arguments would also permit a derivation of these rates when
Assumption 2 is relaxed to the Hölder continuity of $m_0$ or of its derivatives. We have refrained
from doing so only in order to permit our convergence rates to be concisely stated.

The construction of both estimators involves the use of a smoothing kernel $K$, and a bandwidth
sequence $\{h_n\}$. In order to state our assumptions on $K$, let BIL denote the set of bounded,
integrable and Lipschitz continuous functions on $\mathbb{R}$, and recall that $\{e_n\}$ is the norming sequence
that appears in (1.2) above (see also (2.8) below).

Assumption 3. 

(i) $K \in$ BIL is compactly supported, with $\int K = 1$ and $\int xK(x)\,dx = 0$.

(ii) /t„ e [h^, h] with probability approaching 1 (w.p.a. 1), where h< oo, and h~^ < 

for some rg > 0. 

Remark 2.5. The Lipschitz continuity and compact support of $K$ ease some of our arguments, but
are certainly not necessary for the most fundamental results which underpin our derivations. For
example, Theorem 3.1 in Duffy (2015) requires only that $K$ have one-sided Lipschitz approximants,
a rather weak condition that is consistent with the presence of simple discontinuities.

Remark 2.6. An important feature of the present work, relative to the preceding literature on
nonlinear cointegration, is that we permit the bandwidth sequence to be random, and thus data-
dependent, requiring only that it take values lying in the (growing) interval $[\underline{h}_n, \overline{h}]$ w.p.a.1.
This is of considerable utility in applications, where $h_n$ will typically be chosen with at least
some reference to the sample at hand, making the assumption that $\{h_n\}$ is a 'given' deterministic
sequence quite unrealistic. In the i.i.d. regressor case, results of this kind are given in Einmahl
and Mason (2005). Note that restricting to deterministic bandwidth sequences would not help us
to obtain better rates of convergence than are given in Theorem 2.2 below.

Assumptions 1-3 are maintained throughout the paper, even when no explicit reference is 
made to them. We shall treat the parameters (including H and a) describing the data generating 
mechanism as ‘fixed’, ignoring the dependence of any constants on these. 

2.2 Asymptotic behaviour of the regressor process 

Before proceeding to an account of our main results, we describe the limiting behaviour of the
standardised regressor process $X_n(r) := d_n^{-1}x_{\lfloor nr\rfloor}$ that is entailed by our assumptions, and which
is fundamental to our results. (The required norming sequence $\{d_n\}$ is given in (2.8) below.)

Part (i) of Assumption 1 implies that there exists a slowly varying sequence $\{\varrho_n\}$ such that

$$\frac{1}{n^{1/\alpha}\varrho_n}\sum_{t=1}^{\lfloor nr\rfloor}\varepsilon_t \rightsquigarrow Z_\alpha(r) \tag{2.4}$$

where $Z_\alpha$ denotes an $\alpha$-stable Lévy motion on $\mathbb{R}$, with $Z_\alpha(0) = 0$. That is, the increments of $Z_\alpha$ are
stationary, and for any $r_1 < r_2$ the characteristic function of $Z_\alpha(r_2) - Z_\alpha(r_1)$ has the logarithm

$$-(r_2 - r_1)\,c\,|\lambda|^{\alpha}\left[1 - i\beta\,\mathrm{sgn}(\lambda)\tan\left(\frac{\pi\alpha}{2}\right)\right]$$

where $\beta \in [-1,1]$ and $c > 0$; following Jeganathan (2004, p. 1773), we impose the further
restriction that $\beta = 0$ when $\alpha = 1$. We shall also require that $\{\varrho_n\}$ be chosen such that $c = 1$ here,
which provides a convenient normalisation for the scale of $Z_\alpha$. (Thus when $\alpha = 2$, $Z_\alpha$ corresponds
to a Brownian motion with variance 2.) Let $X$ denote the linear fractional stable motion (LFSM)

$$X(r) := \int_0^r (r-s)^{H-1/\alpha}\,dZ_\alpha(s) + \int_{-\infty}^{0}\left[(r-s)^{H-1/\alpha} - (-s)^{H-1/\alpha}\right]dZ_\alpha(s) \tag{2.5}$$

with the convention that $X = Z_\alpha$ when $H = 1/\alpha$. (See Samorodnitsky and Taqqu, 1994, for
a detailed discussion of the LFSM; note that when $\alpha = 2$, $X$ is a fractional Brownian motion.)
Associated to $X$ is the occupation density (local time) process $\mathcal{L} := \{\mathcal{L}(a)\}_{a\in\mathbb{R}}$, a process which,
almost surely, has continuous paths and satisfies

$$\int_{\mathbb{R}} f(x)\mathcal{L}(x)\,dx = \int_0^1 f(X(r))\,dr, \qquad \forall f \text{ bounded, measurable.} \tag{2.6}$$

(See Theorem 0 in Jeganathan, 2004.)
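
The occupation time formula can be checked numerically on a simulated path. The sketch below is only an illustration under stated assumptions: a standardised Gaussian random walk stands in for the limit process, the occupation density is approximated by a kernel estimate with an Epanechnikov kernel, and the helper name `occupation_density`, the bandwidth `h=0.05` and the test function `f` are hypothetical choices.

```python
import numpy as np

def occupation_density(a_grid, path, h):
    """Kernel estimate of the occupation density of a path observed at n equally spaced times."""
    z = (path[None, :] - a_grid[:, None]) / h
    k = 0.75 * np.clip(1.0 - z**2, 0.0, None)    # Epanechnikov kernel
    return k.mean(axis=1) / h

rng = np.random.default_rng(0)
n = 2000
path = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)    # standardised random walk
a_grid = np.linspace(path.min() - 0.2, path.max() + 0.2, 1201)
ell = occupation_density(a_grid, path, h=0.05)

f = lambda x: np.exp(-x**2)                  # a bounded, measurable test function
da = a_grid[1] - a_grid[0]
lhs = np.sum(f(a_grid) * ell) * da           # spatial integral of f against the density
rhs = f(path).mean()                         # time average of f along the path
```

Up to smoothing and discretisation error, `lhs` and `rhs` should be close, which is the sense in which the occupation density converts time averages into spatial integrals.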

Now let [cfc} denote a sequence with Cq = 1 and 


4> 


\(H = 1/a 
otherwise. 


(2.7) 


\~\k 

By Karamata’s theorem (Bingham, Goldie, and Teugels, 1987, Thm. 1.5.11), 
k —> oo. Set 


4 := ^ (2-8) 

and note that the sequences $\{c_k\}$, $\{d_k\}$ and $\{e_k\}$ are regularly varying with indices $H - 1/\alpha$, $H$ and
$1 - H$ respectively. The following is a special case of Proposition 2.1 and Theorem 3.1 in Duffy
(2015).

Proposition 2.1. For every $f \in$ BIL,

$$\frac{1}{e_n h_n}\sum_{t=1}^{n} f\left(\frac{x_t - d_n a}{h_n}\right) \rightsquigarrow \mathcal{L}(a)\int_{\mathbb{R}} f \tag{2.9}$$

on $\ell_{ucc}(\mathbb{R})$, jointly with $X_n(r) \rightsquigarrow_{fdd} X(r)$.


2.3 Order estimates 


Our rates of convergence will be obtained with the aid of the following order estimates, for the
covariance and zero energy processes respectively, which appear to be new in the literature.

Theorem 2.1. Suppose $f \in$ BIL. Then


-TTfSUp 


E/ 

t=l 


Xf — d^a 


<p(_l + n^^‘i° ''°)logn; 


and if additionally $\int_{\mathbb{R}} f = 0$ and $\int|f(x)x|\,dx < \infty$,


( 2 . 10 ) 


1 




sup 

aeK 



logn. 


( 2 . 11 ) 


See Section 5 for the proof. In both cases the assumed smoothness of $f$ permits the supremum
over $\mathbb{R}$ to be effectively reduced to a maximum over a sequence of finite sets. For the zero
energy process, the requisite bound over these sets is provided (essentially) by Proposition 4.2 in Duffy
(2015). It turns out that a counterpart of this result is available for the covariance process, but
its application requires that a truncation be first applied to $\{u_t\}$. In order to state the result, let
denote an appropriately truncated (and centred) version of rjj (see (5.1) below), such that 
= 0 and ||q||„ := llqg-^Hoo < oo- For Y>k=o 


^nf 

t=i 


( 2 . 12 ) 


and for ^ c BI, set 


5nm - llrjIlJIJ^Iloo + IMn + + lli^llz) (2.13) 

where \\^\\ := supj:^^\\f\\. 

Proposition 2.2. Suppose c BI with < n'^. Then 

max|5„/| < 5„(Jf„)logn. 

The proof of this result appears in Section 4. 

Remark 2.7. If $\{u_t\}$ is an exogenous m.d.s., then $\{f(x_t)u_t\}$ itself forms a m.d.s., and an application
of Freedman's (1975, Thm. 1.6) inequality permits the $\log n$ factor on the right side of (2.10)
to be reduced to $\log^{1/2} n$; see Wang and Chan (2014, Thm. 2.1). In the present case, however,
$\{f(x_t)u_t\}$ is not a m.d.s., precluding a direct application of such a result. Instead, we shall rely on
the combination of a suitable subgaussian tail inequality for martingales (Bercu and Touati, 2008,
Thm. 2.1) - which implies Lemma 3.6 below - and a martingale decomposition of $\sum_{t=1}^{n} f(x_t)u_t$.


2.4 Rates of uniform convergence 

The essential features of the problem become apparent when we consider the Nadaraya-Watson
estimator, which admits the decomposition

$$\hat m(x) - m_0(x) = \frac{\sum_{t=1}^{n} K_{h_n}(x_t - x)[m_0(x_t) - m_0(x)] + \sum_{t=1}^{n} K_{h_n}(x_t - x)u_t}{\sum_{t=1}^{n} K_{h_n}(x_t - x)} =: \frac{\Psi_{1n}(x) + \Psi_{2n}(x)}{\Psi_{3n}(x)} \tag{2.14}$$

where $K_h(y) := h^{-1}K(h^{-1}y)$. We shall now examine each of $\Psi_{1n}$, $\Psi_{2n}$ and $\Psi_{3n}$ in turn: compared
with the stationary regressor case, the treatment of the denominator poses some unique challenges
here, and so we turn to it first.
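
As a concrete illustration of the local level estimator, the following minimal sketch simulates a nonlinear cointegrating regression with an endogenous unit-root regressor and computes the Nadaraya-Watson estimate on a grid. The data generating choices (Gaussian innovations, the coefficient `0.5`, the function `np.sin`), the Epanechnikov kernel and all function names are assumptions made purely for illustration; they are not taken from the paper.

```python
import numpy as np

def simulate_model(n, m0, rng):
    """Draw (x_t, y_t) from y_t = m0(x_t) + u_t with x_t a Gaussian random walk."""
    eps = rng.standard_normal(n)      # innovations driving the regressor
    eta = rng.standard_normal(n)
    x = np.cumsum(eps)                # unit-root regressor (partial sums)
    u = 0.5 * eps + eta               # disturbance correlated with eps: endogeneity
    return x, m0(x) + u

def epanechnikov(z):
    """Compactly supported, Lipschitz kernel with integral one and mean zero."""
    return 0.75 * np.clip(1.0 - z**2, 0.0, None)

def nadaraya_watson(x_eval, x, y, h):
    """Local level estimate at each point of x_eval; NaN where no data fall within h."""
    w = epanechnikov((x[None, :] - x_eval[:, None]) / h)   # shape (len(x_eval), n)
    denom = w.sum(axis=1)                                  # the denominator (signal)
    numer = w @ y
    est = np.full(x_eval.shape, np.nan)
    np.divide(numer, denom, out=est, where=denom > 0)
    return est

rng = np.random.default_rng(0)
x, y = simulate_model(500, np.sin, rng)
grid = np.linspace(x.min(), x.max(), 50)
m_hat = nadaraya_watson(grid, x, y, h=0.5)
```

Grid points at which the denominator vanishes are returned as NaN, which makes visible the regions in which the signal is too weak for the estimator to be defined.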


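Denominator Set $\mathcal{L}_n(a) := \frac{1}{e_n}\sum_{t=1}^{n} K_{h_n}(x_t - d_n a)$ and define

$$A_n^\varepsilon := \{x \in \mathbb{R} \mid \mathcal{L}_n(d_n^{-1}x) \ge \varepsilon\}. \tag{2.15}$$

Noting the different standardisations of $\mathcal{L}_n$ and $\Psi_{3n}$, we see that

$$\sup_{x\in A_n^\varepsilon}\Psi_{3n}^{-1}(x) = \Big(\inf_{x\in A_n^\varepsilon}\Psi_{3n}(x)\Big)^{-1} \le e_n^{-1}\varepsilon^{-1}. \tag{2.16}$$

Thus $A_n^\varepsilon$ describes a subset of $\mathbb{R}$ - which depends on the trajectory of $\{x_t\}$ - on which the order
of $\Psi_{3n}^{-1}(x)$ may be uniformly controlled. Importantly, $\varepsilon > 0$ may be chosen (sufficiently small)
such that $A_n^\varepsilon$ contains as large a fraction of the sample as desired, in the limit as $n \to \infty$, in the
sense that

$$\limsup_{n\to\infty} P\left\{\frac{1}{n}\sum_{t=1}^{n} 1\{x_t \notin A_n^\varepsilon\} > \delta\right\} \le \delta \tag{2.17}$$

for any given $\delta > 0$: see the arguments used to verify (5.4) in Duffy (2015).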
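
The construction of the random domain can be sketched numerically by evaluating the standardised signal at the sample points themselves and recording the fraction of observations at which it falls below the threshold. The sketch assumes the unit root case with the hypothetical normalisations `d_n = n**0.5` and `e_n = n / d_n`, an Epanechnikov kernel, and illustrative function names; it is not the paper's own procedure.

```python
import numpy as np

def signal(a, x, h, d_n, e_n):
    """Standardised signal (1/e_n) * sum_t K_h(x_t - d_n * a), Epanechnikov kernel."""
    z = (x[None, :] - d_n * a[:, None]) / h
    k = 0.75 * np.clip(1.0 - z**2, 0.0, None)
    return k.sum(axis=1) / (e_n * h)

def fraction_outside(x, h, eps):
    """Fraction of observations x_t at which the signal is below eps."""
    n = x.size
    d_n = np.sqrt(n)          # assumed norming sequence for the unit root case
    e_n = n / d_n             # assumed companion norming sequence
    at_sample = signal(x / d_n, x, h, d_n, e_n)
    return float(np.mean(at_sample < eps))

rng = np.random.default_rng(2)
x = np.cumsum(rng.standard_normal(400))
fractions = [fraction_outside(x, h=1.0, eps=e) for e in (0.0, 0.05, 0.5)]
```

Because the domains are nested in the threshold, the excluded fraction is nondecreasing in `eps`; taking `eps` small shrinks it, which is the finite-sample counterpart of the statement that the domain can capture as much of the sample as desired.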

In general, $A_n^\varepsilon$ will be a union of disjoint (closed) intervals, even for large $n$. This is necessarily
the case when $H < 1/\alpha$, since in this case $X$ has discontinuous sample paths (Samorodnitsky and
Taqqu, 1994, Example 10.2.5), and so the support of $\mathcal{L}$, which is contained in the range of $X$, will
typically contain gaps. However, in the special case where $E\varepsilon_0^2 < \infty$ (implying $\alpha = 2$) and $H = \tfrac{1}{2}$,
it is possible to replace $A_n^\varepsilon$ by a sequence of (connected) intervals. In order to state a result to
this effect, letf?^ where [a, b]g := [(1 — e)a, (1 — s)b] and let denote the 

order statistics of the sample {Xf}"^^. 

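Proposition 2.3. Suppose $H = \tfrac{1}{2}$ and $E\varepsilon_0^2 < \infty$, and let $\delta > 0$ be given. Then for every $\varepsilon > 0$,
(2.16) holds with $R_n^\varepsilon$ in place of $A_n^\varepsilon$; and for $\varepsilon > 0$ sufficiently small, (2.17) holds with $R_n^\varepsilon$ in place of
$A_n^\varepsilon$.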

Remark 2.8. This result is essentially a consequence of Ray's (1963) theorem, which implies that
the local time $\mathcal{L}$ of a diffusion $J$ is strictly positive on the interior of the range of $J$. In the setting
of the present paper, this seems to be applicable only in the case where $X$ is a Brownian motion.
However, it also applies in the important case where

$$X(r) = \int_0^r e^{\kappa(r-s)}\,dB(s)$$

for $\kappa \in \mathbb{R}$ and $B$ a Brownian motion. Such a process arises as the weak limit of $X_n$ under
the hypotheses of Proposition 2.3, if $\{x_t\}$ is generated according to $x_t = \rho_n x_{t-1} + v_t$, where
$\rho_n = 1 + \kappa/n$: see Wang and Phillips (2009b). (To extend our results to this case would require
very little modification to our arguments; indeed, we explicitly considered such a data generating
mechanism in an earlier version of this paper.) Whether such a characterisation of the support of
$\mathcal{L}$ is available in other cases where $X$ has continuous sample paths, most notably when $\alpha = 2$ and
$H \ne 1/2$ - i.e. when $X$ is a fractional Brownian motion - seems to be an open question.

Remark 2.9. In view of (2.17), the volumes of both $A_n^\varepsilon$ and $R_n^\varepsilon$ must expand, probabilistically,
at rate $d_n$ as $n \to \infty$. However, even under the hypotheses of Proposition 2.3, $A_n^\varepsilon$ could not be
replaced by a sequence of deterministic intervals $[-a_n, a_n]$ whose endpoints diverge at rate $d_n$.
Indeed, suppose that $a_n = C_0 n^{1/2}$ for some $C_0 > 0$. Then by $X_n \rightsquigarrow X$ and the reflection principle
(Revuz and Yor, 1999, Prop. III.3.7),

for every $C_0 > 0$, no matter how small; here $\Phi$ denotes the standard normal c.d.f. With nonzero
probability, $\{x_t\}$ never visits $[\tfrac{1}{2}C_0 n^{1/2}, C_0 n^{1/2}]$, and so the signal is forever negligible within
this range. This accounts for why earlier work on this problem (e.g. Wang and Wang, 2013; Chan
and Wang, 2014; and Gao, Kanaya, Li, and Tjøstheim, 2015), which considered deterministic
intervals of this form, has been restricted to domains whose volume grows at a rate strictly slower
than $d_n$, which necessarily contain a vanishingly small fraction of the observed $\{x_t\}$ as $n \to \infty$.
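
The reflection principle argument can be made numerically concrete. For a Brownian motion $B$ with $\mathrm{Var}\,B(1) = \sigma^2$, the principle gives $P\{\sup_{r\in[0,1]} B(r) < c\} = 2\Phi(c/\sigma) - 1$ for $c > 0$, which is strictly positive however small $c$ is. The sketch below evaluates this probability; the function name and the default `sigma=1.0` are illustrative assumptions, and the appropriate variance depends on the normalisation adopted for the limit process.

```python
import math

def prob_max_below(c, sigma=1.0):
    """P(sup_{0<=r<=1} B(r) < c) for a Brownian motion with Var B(1) = sigma**2, c > 0."""
    phi = 0.5 * (1.0 + math.erf(c / (sigma * math.sqrt(2.0))))   # normal c.d.f. at c/sigma
    return 2.0 * phi - 1.0

# Probability that the path stays below c, for a few (hypothetical) thresholds.
probs = {c: prob_max_below(c) for c in (0.01, 0.1, 0.5, 1.0)}
```

With the threshold set to half the constant scaling the interval endpoints, this is the limiting probability that the standardised regressor never reaches the outer half of the interval, so the signal there stays negligible on an event of positive probability.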

Numerator To provide a measure of the 'regularity' of $m_0$ over a given domain, we associate to
$m_0$ the mappings $m_1, m_2 : \mathcal{P}(\mathbb{R}) \to \mathbb{R}_+ \cup \{\infty\}$, defined by

$$m_1(A) := \sup_{x\in A}|m_0'(x)| \qquad m_2(A) := \sup_{x\in A}|m_0''(x)|. \tag{2.18}$$



Let := {x e M I d(x,A^) < where is chosen such that the support of JC is contained 

in [—Cjf,Cjf]. Then a Taylor series expansion of ttiq around Xf, for each t e yields the 

estimate 


supIvTi^l </i„mi(A^)sup 
xeAi xeM 


E 

t=i 




x) 


+ 


^n^2(^^)sup^|fC^^^(Xt - X)|, (2.19) 


■ t=l 


where denotes x •-» x^/(x). Applying Theorem 2.1 and Proposition 2.1 to the first and 
second terms on the right, respectively, then gives 


1 9 

— suplvpij <p 

xeA® 




Similarly we have by Theorem 2.1 that 


-sup|'T2nl^p(l + r^^^" 

9:eA® 


logn 


( 2 . 20 ) 


( 2 . 21 ) 


In view of (2.16), the rate at which $\hat m(x)$ converges uniformly to $m_0(x)$ on $A_n^\varepsilon$ is given by the
sum of the right sides of (2.20) and (2.21). In giving a formal statement of our results below, we
assume that $q_0$ and $r_0$ are such that $n^{1/q_0 - r_0} \lesssim 1$, so that the right side of (2.21) takes a simplified
form.


Theorem 2.2. Suppose $r_0 \ge q_0^{-1}$. Then for every $\varepsilon > 0$,


e- sup|m(x)-mo(x)| < 


xeAi 


logn 


1/2 


+ m2(A^J 


+ 


logn 


( 2 . 22 ) 


and 

„_ ~ logn 

e ■ sup |mi(x) - mg(x)| < /i^m 2 (A„) + (2.23) 

xeA^„ 

Remark 2.10. These uniform convergence rates agree almost exactly with their pointwise counterparts
(see e.g. Wang and Phillips, 2011), except for

(i) the presence of the $\log n$ factors; and

(ii) the dependence of the bias terms on the suprema of $m_0'$ and $m_0''$ over $A_n^\varepsilon$.

Thus, as the support of $\{x_t\}$ - to which $A_n^\varepsilon$ is an approximation - expands over the line, these
'uniform' bias terms may shrink less rapidly than their pointwise counterparts, depending on the
tail behaviour of the derivatives of $m_0$.

Remark 2.11. After the manuscript of this paper had been completed, we obtained a copy of
an unpublished manuscript by Liu, Chan, and Wang (2014), who determine the uniform rate
of convergence of $\hat m_1$ when $\{x_t\}$ is exogenous ($\{u_t\}$ is a heteroskedastic m.d.s.) and $\{h_n\}$ is a
deterministic sequence. (Rather than a sequence of random domains such as $\{A_n^\varepsilon\}$, they consider
only a deterministic sequence of intervals, with the consequences discussed in Remark 2.9 above.)
Due to the assumed exogeneity, the $\log n$ factor appearing in the second terms on the right sides
of (2.22) and (2.23) can be improved to $\log^{1/2} n$, as per Remark 2.7 above. The presence of
endogeneity would thus seem to penalise the rate of convergence of these estimators by at worst
a factor of $\log^{1/2} n$.

Underpinning these authors' derivations is an analogue of our Proposition 2.1 (their Theorem
2.1), which is worked out under quite different assumptions on the regressor process than
are imposed here. In this regard, we may note particularly their requirement that there exist a
sequence of processes $\{X_n^*\}$ with $X_n^* =_d X$, and a $\delta > 0$ such that

$$\sup_{r\in[0,1]}|X_n(r) - X_n^*(r)| = o_{a.s.}(n^{-\delta}),$$

a condition which excludes a large portion of the processes considered in this paper, for which
merely $X_n \rightsquigarrow_{fdd} X$ is available (this is particularly true when $H < 1/\alpha$ and $\alpha \in (0,2)$, since
in this case the sample paths of $X$ are unbounded: see Samorodnitsky and Taqqu, 1994, Example
10.2.5). On the other hand, our results do not subsume those of Liu, Chan, and Wang
(2014), since those authors do not require $\{v_t\}$ to be a linear process; there is thus only a partial
overlap between the class of processes considered in this paper, and in theirs.

Remark 2.12. Provided that $\varepsilon > 0$ is fixed, and $m_1$ and $m_2$ are bounded on $\mathbb{R}$ - which is perfectly
consistent with linear or sublinear growth in the tails of $m_0$ - Theorem 2.2 implies that requiring
convergence on a domain almost as large as the range of $\{x_t\}$ does not penalise the convergence
rate of either estimator, relative to the rate that could be proved on an interval of fixed width. This
might seem to contrast markedly with the situation when $\{x_t\}$ is stationary, where necessarily
slower rates of convergence hold on domains that expand with the sample size, as the estimator
is pushed into regions where $\{x_t\}$ has a progressively smaller density (see e.g. Hansen, 2008,
Thm. 8; Kristensen, 2009, Thm. 1; and Li, Lu, and Linton, 2012, Thm. 2.1). However, this
phenomenon would re-emerge here if we were to let $\varepsilon = \varepsilon_n \to 0$ as $n \to \infty$: indeed, it is
immediate from (2.22) and (2.23) that the rates of uniform convergence of our estimators, over
the domains $\{A_n^{\varepsilon_n}\}$, would be slowed by a factor of $\varepsilon_n^{-1}$ in this case.

Remark 2.13. In a recent paper, Chan and Wang (2014) argue that while both $\hat m$ and $\hat m_1$ enjoy
similar pointwise bias properties, the latter enjoys markedly better performance than the former,
so far as the uniform behaviour of their respective bias terms is concerned. This conclusion is
partly borne out by Theorem 2.2; but the improved order estimate obtained here for the linear
bias term (the first element on the right side of (2.22)) indicates that this judgement may need
to be somewhat qualified. In particular, if both $m_1(A_n^\varepsilon)$ and $m_2(A_n^\varepsilon)$ are of a comparable magnitude,
then log n —» oo will ensure that the second order bias (the second term on the right side 
of (2.22)) dominates the linear bias term; this is scarcely less restrictive than the condition that 
^ 00 that is required for this conclusion when only the pointwise performance of these 
estimators is in issue (see Wang and Phillips, 2011). 

See Section 6 for the proofs of Proposition 2.3 and Theorem 2.2. 


2.5 An alternative perspective on our results 

There is another way of viewing the estimation problem considered in this paper, which highlights
the connections between our results, and those which are obtained when $\{x_t\}$ is stationary.
Defining a sequence of regression functions $m_n(x) := m_0(d_n x)$, the model (2.1) can be rewritten
as

$$y_t = m_0(x_t) + u_t = m_n(d_n^{-1}x_t) + u_t = m_n(x_{nt}) + u_t,$$

where $x_{nt} := d_n^{-1}x_t$. Taking $h_n = d_n h$ we see that

$$\frac{1}{nh}\sum_{t=1}^{n} K\left(\frac{x_{nt} - x}{h}\right) \rightsquigarrow \mathcal{L}(x) \tag{2.24}$$

in $\ell_\infty(\mathbb{R})$ by Proposition 2.1. In light of this, we might regard the $\{x_{nt}\}$'s as being drawn from
a spatial distribution with marginal density $\mathcal{L}(x)$ - just as stationary regressors would be drawn
from a distribution with marginal density p(x). (Restricting ourselves to yields a domain 

on which the density £ can be bounded away from 0, which is equally desirable in the stationary 
regressor case.) 

Now suppose that almost nothing is known about $\{x_{nt}\}$, beyond the fact that a convergence
result of the same kind as (2.24) holds (together with some knowledge of the support of $\mathcal{L}$). Is
this sufficient to determine the rate at which the local linear estimator of $m_n$, computed from
$\{(y_t, x_{nt})\}_{t=1}^{n}$, converges uniformly to $m_n$? If $\{u_t\}$ is an exogenous m.d.s., then this is indeed the
case. Supposing

$$\sup_{x}|m_n''(x)| \le a_n$$

then as per Remark 2.11, we would have

$$\sup_{x\in A}|\hat m_1(x) - m_n(x)| \lesssim_p h^2 a_n + \left(\frac{\log n}{nh}\right)^{1/2} \asymp a_n^{1/5}\left(\frac{\log n}{n}\right)^{2/5}, \tag{2.25}$$

for any set $A$ on which $\mathcal{L}$ can be bounded away from zero; the final equality follows if $h$ is chosen
so as to balance the order of bias and variance terms.
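
The balancing step can be spelled out as a short worked calculation. It is a sketch under the assumption that the bound takes the form of a squared-bandwidth bias term $h^2 a_n$ plus a variance term $(\log n/(nh))^{1/2}$, and that the norming sequences satisfy $e_n = n/d_n$; constants are ignored throughout.

```latex
\begin{align*}
  h^2 a_n \asymp \left(\frac{\log n}{nh}\right)^{1/2}
  &\iff h^{5/2} \asymp a_n^{-1}\left(\frac{\log n}{n}\right)^{1/2}
   \iff h \asymp a_n^{-2/5}\left(\frac{\log n}{n}\right)^{1/5}, \\
  \text{so that}\quad h^2 a_n
  &\asymp a_n^{1 - 4/5}\left(\frac{\log n}{n}\right)^{2/5}
   = a_n^{1/5}\left(\frac{\log n}{n}\right)^{2/5}. \\
  \text{With } a_n = a_0 d_n^2 \text{ and } e_n = n/d_n:\quad
  a_n^{1/5}\left(\frac{\log n}{n}\right)^{2/5}
  &= a_0^{1/5}\left(\frac{d_n}{n}\right)^{2/5}\log^{2/5} n
   = a_0^{1/5}\, e_n^{-2/5}\log^{2/5} n .
\end{align*}
```

With $a_n$ constant, the same calculation returns the familiar $(\log n/n)^{2/5}$ rate for twice-differentiable functions of a scalar argument.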

The rate of convergence thus depends only on how the 'complexity' of $m_n$ - as measured here
by $a_n$ - varies with $n$. In the stationary setting, $m_n$ is (typically) a fixed function, and so $a_n = a_0$,
a constant. In this case, the right side of (2.25) agrees precisely with Stone's (1982) minimax
optimal rate for twice-continuously differentiable functions. On the other hand, when $\{x_t\}$ is
integrated (or near-integrated), the manner in which $m_n$ is constructed from a fixed $m_0$ gives
$a_n = a_0 d_n^2$, and the best obtainable rate is $O_p(e_n^{-2/5}\log^{2/5} n)$, as per Chan and Wang (2014). This
leads us to believe that the minimax optimality properties of local polynomial regression, for the
estimation of functions belonging to Hölder classes, should extend quite straightforwardly to the
case of an integrated regressor.

It remains to be seen how this analogy might be further extended to the case where $\{x_{nt}\}$
is endogenous. Heuristically, estimation of $m_n$ must be possible, in the integrated case, because
the joint dependence between $u_t$ and $x_{nt}$ becomes progressively weaker, as $n, t \to \infty$. Although
it is not immediately clear how this notion should be made precise - let alone what would be
a suitable analogue of it in the stationary setting - the 'location shift' model of Phillips and Su
(2011) may be counted as an important effort in this direction.


3 Preliminaries 


Preliminary to the proofs of our main results, this section collects some auxiliary lemmas, proofs
of which are given in Section A.1 of the Supplement. We shall rely heavily on the use of the
inverse Fourier transform to analyse objects of the form $\mathbb{E} f(x_{t+k})$, similarly to Borodin and
Ibragimov (1995), Jeganathan (2004, 2008) and Wang and Phillips (2009b, 2011). The following
result permits the use of the ‘usual’ inversion formula, even in cases where $\hat{f} \notin L^1$, for $\hat{f}(\lambda) :=
\int f(x) e^{i\lambda x}\,dx$.

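Lemma 3.1. Suppose $Y = Y_1 + Y_2$, where $Y_1$ is independent of $(Y_2, Z)$, and $Y_1$ has integrable
characteristic function $\psi_{Y_1}$. Then, for every $f \in BI$, $y_0 \in \mathbb{R}$, and $\mathbb{E}|g(Z)| < \infty$,

\[ \mathbb{E} f(y_0 + Y) g(Z) = \frac{1}{2\pi} \int_{\mathbb{R}} \hat{f}(\lambda) e^{-i\lambda y_0}\, \mathbb{E}[e^{-i\lambda Y} g(Z)]\, d\lambda. \tag{3.1} \]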
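
The inversion formula can be checked numerically in a simple special case. The sketch below is illustrative only and rests on assumptions that are not part of the lemma: $f(x) = e^{-x^2/2}$, $Y = Y_1 \sim N(0, \sigma^2)$, $Y_2 = 0$ and $g \equiv 1$, for which both sides are available in closed or nearly closed form. The function names are hypothetical.

```python
import numpy as np


def lhs_closed_form(y0: float, sigma: float) -> float:
    """E f(y0 + Y) for f(x) = exp(-x^2 / 2) and Y ~ N(0, sigma^2)."""
    s2 = 1.0 + sigma ** 2
    return float(np.exp(-y0 ** 2 / (2.0 * s2)) / np.sqrt(s2))


def rhs_fourier(y0: float, sigma: float,
                half_width: float = 40.0, num: int = 200_001) -> float:
    """(1 / 2 pi) * integral of f_hat(lam) exp(-i lam y0) E exp(-i lam Y) d lam."""
    lam = np.linspace(-half_width, half_width, num)
    dlam = lam[1] - lam[0]
    # Fourier transform of f under the convention f_hat(lam) = int f(x) e^{i lam x} dx.
    f_hat = np.sqrt(2.0 * np.pi) * np.exp(-lam ** 2 / 2.0)
    # Characteristic function of Y evaluated at -lam (real, since Y is symmetric).
    char_fn = np.exp(-(sigma ** 2) * lam ** 2 / 2.0)
    integrand = f_hat * np.exp(-1j * lam * y0) * char_fn
    # Plain Riemann sum; the integrand is negligible at the endpoints.
    return float(np.real(np.sum(integrand) * dlam) / (2.0 * np.pi))
```

Here the integrability of the characteristic function of $Y_1$ is what makes the right side well defined; the Gaussian choice merely makes the comparison easy.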

Let $\mathcal{F}_s^t := \sigma(\{\epsilon_r\}_{r=s}^t)$, noting that $\mathcal{F}_{s_1}^{s_2}$ and $\mathcal{F}_{s_3}^{s_4}$ are independent whenever $s_1 \le s_2 < s_3 \le s_4$.

We shall have frequent recourse to the following decomposition. 


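\[ x_t = \sum_{k=1}^{t} v_k = \sum_{k=1}^{t} \sum_{l=0}^{\infty} \phi_l \epsilon_{k-l} =: x^*_{s-1,t} + x'_{s,t,t} \tag{3.2} \]

for $1 \le s \le t$, where $x^*_{s-1,t}$ and $x'_{s,t,t}$ are independent, and $x^*_{s-1,t}$ is $\mathcal{F}_{-\infty}^{s-1}$-measurable. Defining
$a_j := \sum_{l=0}^{j} \phi_l$, we may further decompose $x'_{s,t,t}$ as

\[ x'_{s,t,t} = \sum_{i=s}^{t} a_{t-i}\epsilon_i = \sum_{i=s}^{r} a_{t-i}\epsilon_i + \sum_{i=r+1}^{t} a_{t-i}\epsilon_i =: x'_{s,r,t} + x'_{r+1,t,t} \tag{3.3} \]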

where $x'_{s,r,t}$ is $\mathcal{F}_s^r$-measurable, and $x'_{r+1,t,t}$ is $\mathcal{F}_{r+1}^t$-measurable. The following property of the
coefficients $\{a_j\}$ is particularly important: there exist $0 < \underline{a} < \overline{a} < \infty$, and a $k_0 \in \mathbb{N}$ such that

\[ \underline{a} \le \inf_{k \ge k_0+1}\ \inf_{\lfloor k/2 \rfloor \le l \le k} c_k^{-1}|a_l| \le \sup_{k \ge k_0+1}\ \sup_{\lfloor k/2 \rfloor \le l \le k} c_k^{-1}|a_l| \le \overline{a}. \tag{3.4} \]

This is an easy consequence of Karamata’s theorem (see Section F of the Supplement to Duffy
(2015) for a proof). Throughout the remainder of the paper, $k_0$ refers to the object of (3.4); it is
also implicitly maintained that $k_0 \ge 8p_0$, for $p_0$ in Assumption 1(i).

Having decomposed $x_t$ into a sum of independent components, we shall proceed to control
such objects as the right side of (3.1) with the aid of the following lemma, which provides bounds
on integrals involving the characteristic functions of some of those components of $x_t$. (This lemma
summarises and refines some of the calculations presented on pp. 15-21 of Jeganathan, 2008.)
In order to state this result, we first note that Assumption 1(i) is equivalent to the statement that

logt/)(A) = -|A|“G(A) 


1 



(3.5) 


for all $\lambda$ in a neighbourhood of the origin, where $G$ is slowly varying at zero (see Ibragimov and
Linnik, 1971, Thm. 2.6.5). (Here, as throughout the remainder of this paper, a slowly varying
(or regularly varying) function is understood to take only strictly positive values, and have the
property that $G(\lambda) = G(|\lambda|)$ for every $\lambda \in \mathbb{R}$.)


Lemma 3.2. Let p e [0,5], q € [1,2], and Zi,Z 2 ^ Then 


(i) there exists a $\gamma_1 > 0$ such that for every $t \ge 0$, $k \ge k_0 + 1$ and $m \in \{0, \dots, k-1\}$,


[ (zi I A|^ A I'' dA < (3.6) 

Jr 


and if F{u) G^^'^iu) as u —> 0, 


(zi|afcP|APf(afcA) Az2)|IEi7t+fc-„ 


-iXx[ 


t+l,t+k,t+k 


l^'dA 


< 


ZiC 


” k 




(3.7) 
(ii) for every $t \ge 1$, $k \ge k_0 + 1$ and $s \in \{k_0 + 1, \dots, t\}$,

Jm. Ck+s 

We note here, for future reference, that the preceding continues to hold when $\eta_t$ is replaced
by $\eta_t^{(\le)}$, as defined in (5.1) below. Let $\mathbb{E}_t[\cdot] := \mathbb{E}[\cdot \mid \mathcal{F}_{-\infty}^t]$. The following is an easy consequence
of the preceding.

Lemma 3.3. Suppose f e BI. Then 
(i) for every t > 0 and k>kQ + l, 




1 i/me {0,...,?c-1}, 

\r]t+k-m\ if mik¬ 


in') and, if in addition m e {0 ,..., fc — 1}, 


\^tf(.Xt+k)Vt+k-m\ 

Recall the definitions of $\{d_k\}$ and $\{e_k\}$ given in (2.8) above. The following is a straightforward
consequence of Karamata’s theorem.

Lemma 3.4. 

(i) 

(ii) < oo; 

(hi) < OO. 

For the reader’s convenience, Lemmas 9.4 and 7.1 from Duffy (2015) are reproduced below;
see that paper for the proofs. For the first of these, define

#(zi,Z2) :-E[e-^i"« -Ee-^i"o] - Ee-^^'o] . 

Lemma 3.5. Uniformly over z^,Z 2 € M, 

l^(zi,Z2)|<[|zirG(zi)Al]i/2[|22l“G(z2)Al]i/2 

where G(u) G(u) as 0. 




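Let $\|\cdot\|_{\tau_1}$ denote the Orlicz norm associated to $\tau_1(x) := e^x - 1$. (See van der Vaart and Wellner,
1996, p. 95 for the definition of an Orlicz norm.) For a martingale $M := \{M_t\}_{t=0}^n$ with associated
filtration $\mathcal{G} := \{\mathcal{G}_t\}_{t=0}^n$, define

\[ [M] := \sum_{t=1}^{n} (M_t - M_{t-1})^2 \qquad \langle M \rangle := \sum_{t=1}^{n} \mathbb{E}[(M_t - M_{t-1})^2 \mid \mathcal{G}_{t-1}]. \tag{3.8} \]

We say that $M$ is initialised at zero if $M_0 = 0$. The next result is a straightforward consequence of
Theorem 2.1 in Bercu and Touati (2008).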

Lemma 3.6. Let denote a sequence of index sets, and {K^} a real sequence such that #©n+K^ < 

n^. Suppose that for each n e N, Ic e {1 ,... ,K^} and 6 e 0„, is a martingale, initialised at 

zero, for which 

•= max{||[M„fc(0)]||^^ V ||(M„fc(e))||^J < oo. 
ee0„ 


Then 


max 

0e0„ 


k=l 


< 



logn. 


4 Controlling the truncated covariance process 

We turn first to the proof of Proposition 2.2. For this section only, we shall denote $\eta_t^{(\le)}$ by simply
$\eta_t$. Then recalling (2.12) above, we may write

\[ S_n f = \sum_{m=0}^{n} \theta_m \sum_{t=1}^{n} f(x_t)\eta_{t-m} =: \sum_{m=0}^{n} \theta_m S_{nm} f. \tag{4.1} \]

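For each $m \in \{0, \dots, n\}$, by following a procedure identical to that described in Section 7 in Duffy
(2015), the process $S_{nm} f$ may be decomposed as

\[ S_{nm} f = \mathcal{N}_{nm} f + \sum_{k=0}^{n-1} \mathcal{M}_{nmk} f \tag{4.2} \]

where

\[ \mathcal{N}_{nm} f := \sum_{t=1}^{n} \mathbb{E}_0 f(x_t)\eta_{t-m} \qquad \mathcal{M}_{nmk} f := \sum_{t=1}^{n-k} \xi_{mkt} f \]

\[ \xi_{mkt} f := \mathbb{E}_t f(x_{t+k})\eta_{t+k-m} - \mathbb{E}_{t-1} f(x_{t+k})\eta_{t+k-m} \tag{4.3} \]

and we have defined $\mathbb{E}_t[\cdot] := \mathbb{E}[\cdot \mid \mathcal{F}_{-\infty}^t]$.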

A suitable bound for $\|\mathcal{N}_{nm} f\|_\infty$ is provided by Lemma 3.3(ii). By construction, $\{\xi_{mkt} f, \mathcal{F}_{-\infty}^t\}_{t=1}^{n-k}$
forms a martingale difference sequence for each $(m, k)$, and so control over each of the martingale
‘pieces’ $\mathcal{M}_{nmk} f$ may be obtained via control over

\[ \mathcal{U}_{nmk} f := [\mathcal{M}_{nmk} f] = \sum_{t=1}^{n-k} \xi_{mkt}^2 f \]

\[ \mathcal{V}_{nmk} f := \langle \mathcal{M}_{nmk} f \rangle = \sum_{t=1}^{n-k} \mathbb{E}_{t-1} \xi_{mkt}^2 f, \]

in combination with Lemma 3.6. Defining

W(/) — l|l?llnll/lloo + (emlhlln + Cmey^)ll/lll (4-4) 

and 

[ \\ri\\l\\f\t + (eM\l + en)\\f\\l if fc e { 0 , ...,fcol 
^lmk^fy~ \^k^(-^m\\v\\l + ^n)\\f\\i if k & {ko + 1,... ,m} (4.5) 

[ 11/ 111 if fc e {fco V m + 1,..., n - 1}, 

our first result is 

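Lemma 4.1. For all $m \in \{0, \dots, n\}$,

\[ \|\mathcal{N}_{nm} f\|_\infty \lesssim \sigma_{nm}(f) \tag{4.6} \]

and all $0 \le k \le n - 1$,

\[ \|\mathcal{U}_{nmk} f\|_{\tau_1} \vee \|\mathcal{V}_{nmk} f\|_{\tau_1} \lesssim \sigma_{nmk}^2(f). \tag{4.7} \]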


The proof of (4.7), in turn, relies upon 


Lemma 4.2. For every $m \in \{0, \dots, n\}$, $k \in \{0, \dots, n-1\}$ and $t \in \{1, \dots, n-k\}$,


II5L,/IIco+ J] 

S = 1 


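For the next result, recall the definition of $\delta_n(\mathscr{F})$ given in (2.13) above.

Lemma 4.3. If $\mathscr{F} \subset BI$, then

\[ \sum_{m=0}^{n} |\theta_m| \sup_{f \in \mathscr{F}} \sigma_{nm}(f) + \sum_{m=0}^{n} \sum_{k=0}^{n-1} |\theta_m| \sup_{f \in \mathscr{F}} \sigma_{nmk}(f) \lesssim \delta_n(\mathscr{F}). \]

The proofs of these results are given below. We first turn to the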
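Proof of Proposition 2.2. The proof is almost identical to the proof of Proposition 4.2 in Duffy
(2015). In view of Lemma 4.1 and Lemma 4.3, we have immediately that

\[ \max_{f \in \mathscr{F}_n} \Big| \sum_{m=0}^{n} \theta_m \mathcal{N}_{nm} f \Big| \lesssim \sum_{m=0}^{n} |\theta_m| \max_{f \in \mathscr{F}_n} \sigma_{nm}(f) \lesssim \delta_n(\mathscr{F}_n) \]

and through an application of Lemma 3.6, that

\[ \max_{f \in \mathscr{F}_n} \Big| \sum_{m=0}^{n} \sum_{k=0}^{n-1} \theta_m \mathcal{M}_{nmk} f \Big| \lesssim_p \delta_n(\mathscr{F}_n) \log n \]

whence the result follows from (4.1) and (4.2).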


□ 


Proof of Lemma 4.1. For (4.6), note that by Lemma 3.3, 


if t^{l,...,ko} 
l|Eo/(xt)rjt_^||oo < i d“^||i7||„||/||i if t e {fco + l,...,m} 


^^m\\J 111 


if t e {/cq V m + 1,... n}. 


whence, by Karamata’s theorem and Lemma 3.4(i), 


l|Af.^/lloo< 


^ fco m n A 

t=fco+l t=kQ\/m+lj 

f 


l|Eo/(^£)'>lt-n 


n 


<kr 


~ <^011 'IWnWJ Iloo 


< 

/N,; II ’IWnWJ Iloo 


n ^ I ^ ^ 

t=fco+l 

+ [emWlWn + Cm^n^] 


A 


-2 
t 

t=fcoVm+l J 


(4.7) follows from Lemma 4.2 in exactly the manner described in the proof of Lemma 7.3 in Duffy 
(2015). □ 

Proof of Lemma 4.2. The proof is similar to that of Lemma 7.4 in Duffy (2015). Let $m_0 := k_0 \vee m$.
We shall obtain the requisite bound for $\mathbb{E}_t \xi_{m,k,t+s}^2 f$ by providing a bound for $\mathbb{E}_{t-s} \xi_{mkt}^2 f$ (for
$s \in \{1, \dots, t\}$) that depends only on $m$, $k$ and $s$ (and not $t$), separately considering the cases
where

(i) $k \in \{m_0 + 1, \dots, n - t\}$;
(ii) $k \in \{k_0 + 1, \dots, m_0\}$; and
(iii) $k \in \{0, \dots, k_0\}$.




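(i) Recall the decomposition given in (3.2) and (3.3) above, applied here to reduce $x_{t+k}$ to a
sum of independent pieces,

\[ x_{t+k} = x^*_{0,t+k} + x'_{1,t+k,t+k} = x^*_{0,t+k} + x'_{1,t-1,t+k} + x'_{t,t,t+k} + x'_{t+1,t+k,t+k}, \]

with the convention that $x'_{1,t-1,t+k} = 0$ if $t = 1$, so that by Fourier inversion (Lemma 3.1),

\[ \xi_{mkt} f = (\mathbb{E}_t - \mathbb{E}_{t-1}) f(x_{t+k})\eta_{t+k-m} = \frac{1}{2\pi} \int_{\mathbb{R}} \hat{f}(\lambda) e^{-i\lambda x^*_{0,t+k}} e^{-i\lambda x'_{1,t-1,t+k}} \big[ e^{-i\lambda a_k \epsilon_t} - \mathbb{E} e^{-i\lambda a_k \epsilon_t} \big]\, \mathbb{E}\eta_{t+k-m} e^{-i\lambda x'_{t+1,t+k,t+k}}\, d\lambda. \tag{4.8} \]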


Thence 





(4.9) 


(Note that $\xi_{mkt} f$ is real-valued, so $\xi_{mkt}^2 f = |\xi_{mkt} f|^2 = \xi_{mkt} f \cdot \overline{\xi_{mkt} f}$.)

Now suppose s e {fc -F 1,..., t}. Taking conditional expectations on both sides of (4.9) gives 

• '^(Aiafc, A2afc)Ee“‘^^i+^2^<^+i,t-w+fc 

• Ei7t+fc_^e“'^i^t+i,t+ic.t+icEi7t+fc_^e“'^"^t+i.t+ic.t+(c dAi dA2, 


where we have defined 

-^( 21 , 22 ) :^E[e“^i^« -Ee'^i'^o] [e“’^i‘'« - Ee'^^^o j 
for 2^,22 e M, and made the further decomposition 

x^ ^x' -Fx' 

with the convention that x^ = 0 if s = t. Thence, using (3.4) and Lemma 3.5, and the 

inequalities |/(A)| < \\f\\i and |ab| < |ap -F |b|^, we obtain 

^ II |/(Ai)/(A2)||Ee-^^^i+^^K-..M-M..| (4.10) 
■ |Ei7t+fc_;„e"'^i^t+i.f+'^.f+*^ I |Ei7t+fc_;„e“'^"'''+i.t+<^.t+<^ | dAi dAs 
^ ll/ll? J (|afcAirG(afcAi)Al)|E77,+fc_^e-'^^WM+fc|2 ( 4 . 11 ) 

[ I dA2 dAi, 


where we have appealed to symmetry (in $\lambda_1$ and $\lambda_2$) to reduce the final bound to a single term.
By a change of variables and Lemma 3.2(ii),


/ dA2 = / |Ee"^^^^-+i.f-i.f+H dA < —d-\ 

J J ^k+s 

while Lemma 3.2(i) gives 

J (|afcAi|“G(afcAi) A l)|Ei7t+fc_^e"'^i^t+v+M+fc|2dA < k~'^d~^c^ + 
Together, (4.11)-(4.13) yield 

^k+s 

when s e {1,..., fc}, (4.10) continues to hold, whence 

^ (/l/(A)|(|a,Ai|“G(a,Ai)Al)|Er,,+,_^e-'^"«.-M./c|^ 

<\\f\\l(ik-G^d-^c^ + e-r^’^f 
<d;\k-^d-^cl + e-r^’^)\\f\\l 


(4.12) 


(4.13) 


(4.14) 


(4.15) 


by Lemma 3.2(i); the replacement of a d^^ by d~^ in the final bound is justified because s < k. 
Since {cj.} is regularly varying and fc > /cq + I, it follows from Potter’s inequality (Bingham, Goldie, 
and Teugels, 1987, Thm. 1.5.6(iii)) that 


' + S ^ ^ Sd/ < nd/ = e„, 

5 = fc + l 


S = 1 


S = 1 


with the final bound following by Karamata’s theorem. As noted above, since the bounds (4.14) 
and (4.15) do not depend on t, they apply also to Ef<^^ Hence, in view of the preceding. 


n-k-t 




S = 1 


n-k-t 


Ec’+ E 

S=k+1 Hc+s 


<(fc-id-3c2 +e-A'^)e„ 


S = 1 


1 - 
Turning now to $\|\xi_{mkt}^2 f\|_\infty$, note that (4.8) still holds, with the convention that $x'_{1,t-1,t+k} = 0$
if $t = 1$. Thus, again by Lemma 3.2(i),

<id-h^ + e-r^'^nf\\l 

where the final bound follows because k<n, and so < k~^e^. 


(ii) Note that $\{k_0 + 1, \dots, m_0\}$ can only be nonempty if $k_0 < m = m_0$; thus $k \le m$ in this case.
Using $|a + b|^2 \lesssim |a|^2 + |b|^2$ gives

^ [(E,|/(x,+fc)77f+fc_,„|)' + (E,_^\f{x,+^)Vt+k-m\)^] ■ (4.16) 

Suppose s e {m — fc + 1,..., t}. Then by successive applications of Lemma 3.3(i), 

i^t-s0^t\fixt+k)Vt+k-m\y ^ d-'^d-'^WfWl, 
and similarly for the second term on the right side of (4.16). Thus 

^t-sC,J<d;\^\f\\l. (4.17) 

When s e {1,..., m — fc}, further applications of Lemma 3.3(i) give 


E,_,(E,|/(x,+j77,+fc_,„|)" < l|i7ll^E,_,(E,|/(x,+fc)|)" 

^d;^d-^r^\\l\\f\\l, 

whence 

^t-s^mkt<d;^d-^Ml\\f\\l 

Together, (4.17) and (4.18) give 


n-k- 

E 


m—k 

k,t+sf ~ ^t^m,k,t+sf 

n-k-t 


S = 1 


S = 1 

S 

=m-k+l 




~ k 

m—k 

n-k-t 

+ E c' 

ll/ll 




S = 1 

s=m—k-\-l 





emMl + e^] 

ll/ll? 



(4.18) 
by Karamata’s theorem.

Regarding $\|\xi_{mkt}^2 f\|_\infty$, it follows from Lemma 3.3(i) that

< ||t7l|2(E,|/(x,+fc)|)2 < d,-2||r,||2||/||2. 

(iii) Suppose that $s \in \{(m - k) \vee k_0 + 1, \dots, t\}$. Using $|a + b|^2 \lesssim |a|^2 + |b|^2$, Jensen’s inequality
and Lemma 3.3(i), we have




2 - 


When $s \in \{k_0 + 1, \dots, m - k\}$, we have similarly

and for $s \in \{1, \dots, k_0\}$, we may use the crude bound


E,-.?Lj<|ie,J|loo< 


2 || r ||2 


(4.19) 


Hence 


n—k—t f kg m—k n—k—t A 

E E, 5 U,.+.t= E+E + E 

\^s=l s=/co+l s=(m—/c)Vfco+l 

f 


5 = 1 


-k 


n-k-t A 

» E c'+ E -1." 

^ 5=fco+l s=(m-/c)Vfco+l j 

^ii/fc+b„iiiiiU»Jii/ii2 


Sltoll>lli;il/ll^+ 11/11^ 


< 


by Karamata’s theorem. The required bound for $\|\xi_{mkt}^2 f\|_\infty$ is given in (4.19).
Proof of Lemma 4.3. It is evident from (4.4) and Lemma 3.4(iii) that 


^|0;„|sup < l|t?LII^#|loo + 

m=0 




m=0 


m=0 


ll^lh 


< 


l|r?||„||^|loo+[llt?L + ey']ll^^lli 


□ 


(4.20) 


Considering the three parts of (4.5) separately, we first have 

n kg 


V V|0^|supcr„^fc(/) < ||q||J|(^||oo + 

m=0k=0 


1/2, 1/2 

mi'^m ~ 


< 


»Ei»' 

m=0 

„ll^lloo+[llt?ll„ + ey"]ll^^ll 2 


ll^lb 


(4.21) 
since $k_0$ is fixed and finite. Next,

n m n m 

Xi X l®mlsupcr„^fc(/)< Xl^ml[e;l/^l|i?lln + ey^]ll^lli X 

m=Ofc=fco+l m=0 k=ko+l 

m=0 

^ [ll^lln + ey"]ll^lli, (4.22) 

using Karamata’s theorem and Lemma 3.4(iii). Finally, 

n n n n 

X X l^mlsupcr„^fc(/) < Xl®mlCmey^ll(^lli X 

m=0k=m+l m=0 k=m+l 

^ey'll^lli (4.23) 

by parts (ii) and (iii) of Lemma 3.4. Recalling (2.13), the result now follows from (4.20)-(4.23).

□ 

5 Proofs of the order estimates 

The proof of (2.10) in Theorem 2.1 may be broken into three parts:

(a) a truncation argument permits $\{u_t\}$ to be replaced by $\{u_t^{(\le)}\}$ on the left side of (2.10);

(b) the supremum over $(a, h) \in \mathbb{R} \times \mathscr{H}_n$ is reduced to a maximum over a (growing) finite set;
and

(c) an application of Proposition 2.2 yields the requisite bound over this finite set.

These steps are described below, following which we provide details of the modifications
necessary for the proof of (2.11).

5.1 Truncation 

Decomposing

\[ u_t = \sum_{k=0}^{\infty} \theta_k \eta_{t-k} = \sum_{k=0}^{n} \theta_k \eta_{t-k} + \sum_{k=n+1}^{\infty} \theta_k \eta_{t-k}, \]

we have by the Cauchy-Schwarz inequality that


sup 

aeM 


1 r f d„a \ 


< 


sup 


aeM ^ 




2 I Af U^L 


1/2 y n 


1/2 


^(4 = Op(l) 


.t=l 


since the first term on the RHS is Op(l) by Proposition 2.1, and 


E < n Xi E 

t=l j=ii+l ;=n+l 


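Next, define for $-n \le t \le n$,

\[ \eta_t^{(\le)} := \eta_t 1\{|\eta_t| \le n^{1/q_0}\} - \mathbb{E}\eta_0 1\{|\eta_0| \le n^{1/q_0}\} \tag{5.1} \]

and $\eta_t^{(>)} := \eta_t - \eta_t^{(\le)}$. Then setting

\[ u_t^{(\le)} := \sum_{k=0}^{n} \theta_k \eta_{t-k}^{(\le)} \qquad u_t^{(>)} := \sum_{k=0}^{n} \theta_k \eta_{t-k}^{(>)} \]

we see that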


sup 

aeM 


E/ 

t=i 


k=0 


Xt dpU ^ 

'u\ 




(>) 

-fc 


fc =0 


7 / 0 y < P| suplUf^^l 7 ^ 0 j- < P| max Irjtl > = o(l). 


,1/qo 


(5.2) 


where the final equality follows by Theorem 2.12.1 in Hansen (2012), since $\{\eta_t\}$ is i.i.d. with
bounded $q_0$th moment. (2.10) will therefore follow once we have shown that


sup 


aeM 




< (l + n^/'i'>-'''>)logn. 
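
The truncation step of this subsection can be sketched in code. This is a minimal illustration under stated assumptions: the truncation level is taken to be $n^{1/q_0}$, and the population centring term is replaced by a sample mean unless the caller supplies it. The function name `truncate_innovations` and its arguments are hypothetical.

```python
import numpy as np


def truncate_innovations(eta, n, q0, centre=None):
    """Split innovations into a bounded, recentred part and a remainder.

    eta    : array of innovations
    n      : sample size entering the truncation level n**(1/q0)
    q0     : moment index (assumed: the q0-th moment of eta is finite)
    centre : value used in place of E[eta_0 * 1{|eta_0| <= n**(1/q0)}];
             if None, the sample mean of the truncated values is used
             as a stand-in (an assumption of this sketch).
    """
    eta = np.asarray(eta, dtype=float)
    level = float(n) ** (1.0 / q0)
    kept = np.where(np.abs(eta) <= level, eta, 0.0)
    if centre is None:
        centre = kept.mean()
    eta_le = kept - centre      # bounded in absolute value by 2 * level
    eta_gt = eta - eta_le       # remainder, so that eta = eta_le + eta_gt
    return eta_le, eta_gt
```

By construction the two parts sum to the original sequence, and the truncated part is bounded by twice the truncation level, which is what allows bounded-difference martingale arguments to be applied to it.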


5.2 Reduction to the maximum over a finite set 

For the remainder of the proof, we may without loss of generality take $f$ to be bounded by unity,
with a Lipschitz constant of unity. To simplify the exposition, we shall require that $h_n \in \mathscr{H}_n$
always, and take $\overline{h} = 1$; the proof in the general case (where this occurs w.p.a.1) requires no new
ideas. As it is less cumbersome to work with the inverse bandwidth $b := h^{-1}$, we define

\[ \mathscr{B}_n := \{h^{-1} \mid h \in \mathscr{H}_n\} = [1, \overline{b}_n] \]

where $\overline{b}_n := \underline{h}_n^{-1}$. For $(a, b) \in \mathbb{R} \times \mathbb{R}_+$,

1 r f 


let/(a_b)(x) := b^/V[b(x-d„a)]. Then 


1/2 _ 

t=l 


for b = b“^. 

Take $C_n := [-n^{\gamma}, n^{\gamma}] \times \mathscr{B}_n$ and let $\mathscr{C}_n \subset C_n$ be a lattice of mesh $n^{-\delta}$. $p_n(a, b)$ denotes the
projection of $(a, b)$ onto a nearest neighbour in $\mathscr{C}_n$ (with some tie-breaking rule). We shall now
prove that $\gamma$ and $\delta$ may be chosen (sufficiently large) such that

sup l^{(a, b)|^ sup |7^{(a,b)|-FOp(l) 

{a,b')sCn (a,b)e'if„ 

sup |7^{(a, b)| ^Op(l), 

(a,b)e[-nr,nr]^x^„ 


(5.3) 

(5.4) 


with the aid of the following. 

Lemma 5.1. For every y > 0, there exists a 5 > 0 such that 


sup 

(a,b)eC„ 



n 


Y^\fia,b)iXt) - fp,ia,b)iXt)\\u^r^ 
t=l 


Op(l). 


Lemma 5.2. Suppose f e BIL. Then |/(x)| = o(|x| ^/^) as x —» ±oo. 

Since ||Uf“^||oo ^ Lemma 5.1 may be proved by an argument identical to that used in 

the proof of Lemma 6.1 in Duffy (2015), while Lemma 5.2 is a special case of Lemma B.l in the 
Supplement to that paper. Observe that (5.3) follows immediately from Lemma 5.1. To establish 
(5.4), first note that w.p.a.l. 


inf inf lx* — d„a\ > 1 — 

t<n\a\>nr 


-r^-i 


max|xj 

t<n 


= d„n’'(l + Op(l)), 


provided that y is chosen large enough that 


Emax|Xf| < n^E|vol = o(n^d„). 

t<n 

For the proof that such a y exists, see the arguments following (C.l) in the Supplement to Duffy 
(2015). Thence by Lemma 5.2 

max sup — d„a)] < max sup |xt — d„a|“^^^ 

{a,b)e[-nynrYxgs„ t<n |„|>„r 
whence


sup 

(a,b)e[—n'^,n'’'yx‘M„ 


1 ” 

|7^{(a,b)|< sup 

f n2/go y/" 
[e^d^nrj 
-0(1) 




for Y > 0 sufficiently large. 
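
The reduction to a finite set relies on projecting each pair $(a, b)$ onto a nearby lattice point. The following sketch shows one way such a projection might be coded; the rectangular lattice anchored at the lower corner, the rounding rule for ties, and the names `snap` and `project_to_lattice` are all assumptions made for illustration.

```python
import math


def snap(value: float, lo: float, hi: float, mesh: float) -> float:
    """Nearest point of the grid {lo, lo + mesh, lo + 2*mesh, ...} within [lo, hi]."""
    value = min(max(value, lo), hi)           # clip into the interval first
    k_max = math.floor((hi - lo) / mesh)      # index of the last grid point
    k = min(math.floor((value - lo) / mesh + 0.5), k_max)  # ties round upwards
    return lo + k * mesh


def project_to_lattice(a: float, b: float, n: int,
                       gamma: float, delta: float, b_max: float):
    """Project (a, b) onto a lattice of mesh n**(-delta) in [-n**gamma, n**gamma] x [1, b_max]."""
    mesh = float(n) ** (-delta)
    a_bound = float(n) ** gamma
    return snap(a, -a_bound, a_bound, mesh), snap(b, 1.0, b_max, mesh)
```

The number of lattice points grows only polynomially in $n$, which is the property needed for a maximal inequality over the finite set to cost no more than a logarithmic factor.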

5.3 Control over the finite set 

It remains to provide an estimate for 


sup |7^{(a,b)|-— sup 

(a,b)e%'„ {a,b)s‘€„ 


Y^f(a,b)iXt)u^t 


<) 


t=l 




sup|5„g|. 


where := {/(a,b) I e and 5„g is defined as in (2.12) above. Since / e BI and 

it is clear that 


3n(%) ^ MnbT + [ll^lln + = 0[el^\l + n'/'Jo-'-o)] 


and thus 




sup|5„g|<p(l + ni/‘3«-''o)logn 


by Proposition 2.2. This completes the proof of (2.10). 


5.4 Modifications required for the proof of (2.11) 

The proof of (2.11) is almost identical to the preceding, albeit somewhat simpler. The truncation 
performed in Section 5.1 is not necessary, while the same argument as given in Section 5.2 may 
be used to reduce the problem to that of providing a suitable bound for 



sup|5*g| 


where S*g := j Define 


■■= m\oo+ei^\m\i + \\n2)- 


n-l 




- 3/2 

k 


.k=l 


k=l 




< 


l|J^lloo + ey"(ll=^lli +11^112+ lli^ll[i]) 
where $\|f\|_{[1]} := \inf\{c \in \mathbb{R}_+ \mid |\hat{f}(\lambda)| \le c|\lambda|\}$, and the final bound follows from parts (i) and (ii) of
Lemma 3.4. With the aid of Lemma 9.1(ii) in Duffy (2015), it is easily verified that

ll^nlll V \\%\\2 V < 1 ^ bT = 


whence by Remark 4.1 in Duffy (2015), 


as required. 


sup|5*g| <p e^^^^5l{%)logn<logn 


6 Proofs of the convergence rates 


Recall that 

1 " 

where K satisfies Assumption 3. 

Proof of Proposition 2.3. Define 

re[0,l] 

sup |A(r)|. 
re[0,l] 

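Since $(\alpha, H) = (2, \tfrac{1}{2})$, $X$ is a Brownian motion, and so by Ray’s (1963) theorem,

\[ \mathbb{P}\Big\{ \inf_{a \in [\mathfrak{m}, \mathfrak{M}]_\epsilon} \mathcal{L}(a) > 0 \Big\} = 1. \tag{6.1} \]

For a more detailed argument as to why (6.1) follows from Ray’s theorem, see (6.4.36)-(6.4.38)
and the surrounding discussion in Karatzas and Shreve (1991). (Note that it is necessary that $\epsilon >
0$ here, since $\mathcal{L}(\mathfrak{m}) = \mathcal{L}(\mathfrak{M}) = 0$ by the continuity of $\mathcal{L}$.) Under the assumption that $\mathbb{E}\epsilon_0^2 < \infty$,
$X_n \rightsquigarrow X$ on $\ell_\infty[0, 1]$ by Hannan (1979), whence $(\mathfrak{m}_n, \mathfrak{M}_n) \rightsquigarrow (\mathfrak{m}, \mathfrak{M})$ by the continuous mapping theorem
(CMT). By Theorem 3.1 in Duffy (2015), $\mathcal{L}_n \rightsquigarrow \mathcal{L}$ on $\ell_{\mathrm{ucc}}(\mathbb{R})$, and thus a further application of the
CMT yields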

inf 

X6R® e 


1 

rE 


KhS^t 


t=i 


x)= inf 




inf £(a). 

ae[m,OTt]5 


( 6 . 2 ) 


Together, (6.1) and (6.2) yield (2.16). 
To obtain (2.17), we note that


1 .^ 

> (1 - e)x(„ 3 } < l{X„(r)>(l-£-)Wl„}dr + Op(l). 


(6.3) 


Let $\{g_k\}$ denote a uniformly bounded sequence of functions that converges pointwise to $x \mapsto
1\{x \ge 0\}$, from above. Then

[ l{X„(r) > (1-e)ajt„}dr < / gfc[X„(r) - (1 - e)911„] dr 
Jo Jo 

f gk[X{r)-(l-s)m]dr 


l{X(r) > (1 -e)911}dr, 


(6.4) 


as $n \to \infty$ and then $k \to \infty$, by the CMT and the dominated convergence theorem. Further,


l{X(r)>(l-e)im}dr 


l{X(r) = Sr)l}dr - / l{x = 91t}£(x)dx - 0 (6.5) 


as $\epsilon \to 0$, by dominated convergence, (2.6), and the fact that $\mathcal{L}(\mathfrak{M}) = 0$. It follows from (6.3)-
(6.5) that $\epsilon > 0$ may be chosen such that


( 1 n Id 

- Y! > (1 - > 6 i < -. 

” t=l J ^ 


By an analogous argument, this holds also when l{Xf > (1 — e)X('„^} is replaced by l{Xf < 
(1-£>(!)}■ □ 

The proof of Theorem 2.2 requires the following two results. For a matrix $A$, let $\|A\|_T :=
\sup_{\|x\| = 1} \|Ax\|$.

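Lemma 6.1. For every $g \in BIL$ with $\int |g(x)x|\,dx < \infty$,

\[ \sup_{a \in \mathbb{R}} \Big| \frac{1}{e_n h_n} \sum_{t=1}^{n} g\Big( \frac{x_t - d_n a}{h_n} \Big) - \mathcal{L}_n(a) \int g \Big| = o_p(1). \tag{6.6} \]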


Lemma 6.2. Suppose that Y^^a) is a (k x k) matrix-valued process, such that Tn(a) is positive 
semi-definite for every a e M and n e N, and let F be a positive definite {k x k) matrix, for which 
suPaeMl|f"r,(a) - r£„(a)||j- ^ 0^(1). Then 


sup ||Yn(a) ^llr^pl. 

{a|Z;„(a)>e} 
Proof of Lemma 6.1. Setting $f(x) := g(x) - K(x) \int g$, the left side of (6.6) may be written as


sup 

asM 


enK 


Zf 



< 

r\jp 


logn 


^Op(l) 


by Theorem 2.1. □ 

Proof of Lemma 6.2. By multiplying both and AC^ by A“^, we may reduce the problem to one 
in which A = I^.. We may also replace (Y„,£„) by a distributionally equivalent sequence for which 
(£, £) in see Theorem 1.10.3 in van der Vaart and Wellner (1996). Define 

•— (<^)llr 5 ^nd let VLqCiQ. denote a set, with Pflg = 1, on which £" —> £" in 

£ucc(l®)> r" —» 0 in fooM- 

It is easily verified that, for $B$ a real symmetric matrix, and $z > 0$,

\[ \lambda_{\min}(B) = z + \lambda_{\min}(B - zI) \ge z - \|B - zI\|_T \]

where $\lambda_{\min}(B)$ denotes the smallest eigenvalue of $B$. Thus, fixing an $\omega \in \Omega_0$,


0 ) 




> £ — sup r"(a) 


whence 


sup ||T"(a)-i||r = 

{a|£“(a)>£} 


-1 


inf A„ 

{a|Z:“(a)>£} 




from which the result follows. 


,-i 


□ 
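
The eigenvalue inequality used in the preceding proof, namely that the smallest eigenvalue of a real symmetric matrix $B$ is at least $z$ minus the spectral norm of $B - zI$, is easy to check numerically. The sketch below is illustrative only, with hypothetical helper names, and uses standard NumPy routines.

```python
import numpy as np


def smallest_eigenvalue(B: np.ndarray) -> float:
    """Smallest eigenvalue of a real symmetric matrix (eigvalsh sorts ascending)."""
    return float(np.linalg.eigvalsh(B)[0])


def spectral_norm(A: np.ndarray) -> float:
    """Operator norm sup_{|x| = 1} |A x| induced by the Euclidean norm."""
    return float(np.linalg.norm(A, 2))


def eigenvalue_lower_bound(B: np.ndarray, z: float) -> float:
    """The bound z - ||B - z I|| on the smallest eigenvalue of B."""
    k = B.shape[0]
    return z - spectral_norm(B - z * np.eye(k))
```

When $B$ is close to $zI$ in operator norm the bound is nearly $z$, which is how uniform closeness to a multiple of the identity translates into a uniform bound on the norm of the inverse.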


Proof of Theorem 2.2. Since (2.22) follows from arguments given in the course of Section 2.4, we
provide the proof of only (2.23) here. In the notation of Fan and Gijbels (1996, pp. 58f), $\hat{m}_1(x)$
is given by the first element of

\[ \hat{\beta}(x) := (X'WX)^{-1} X'Wy, \]

which admits the decomposition

\[ \hat{\beta}(x) - \beta(x) = (X'WX)^{-1} X'W[\mathbf{m}_0 - X\beta] + (X'WX)^{-1} X'Wu, \]

where $y := (y_1, \dots, y_n)'$, $u := (u_1, \dots, u_n)'$,

\[ X := \begin{bmatrix} 1 & x_1 - x \\ 1 & x_2 - x \\ \vdots & \vdots \\ 1 & x_n - x \end{bmatrix} \qquad W := \operatorname{diag} \begin{bmatrix} K_h(x_1 - x) \\ K_h(x_2 - x) \\ \vdots \\ K_h(x_n - x) \end{bmatrix} \qquad \mathbf{m}_0 := \begin{bmatrix} m_0(x_1) \\ m_0(x_2) \\ \vdots \\ m_0(x_n) \end{bmatrix} \]

and $\beta(x) := (m_0(x), m_0'(x))'$.
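
For concreteness, the weighted least squares form of the local linear estimator can be sketched as follows. This is an illustration rather than the paper's own implementation: the Epanechnikov kernel, the normalisation $K_h(u) = h^{-1} K(u/h)$ and the function names are assumptions of the sketch.

```python
import numpy as np


def epanechnikov(u: np.ndarray) -> np.ndarray:
    """Epanechnikov kernel, supported on [-1, 1] (an illustrative choice)."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)


def local_linear(x, y, x0: float, h: float, kernel=epanechnikov) -> np.ndarray:
    """Return (X'WX)^{-1} X'Wy at the point x0.

    The first element estimates the regression function at x0 and the
    second its first derivative.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x - x0])   # rows (1, x_t - x0)
    w = kernel((x - x0) / h) / h                     # assumed K_h(u) = K(u / h) / h
    XtW = X.T * w                                    # X'W without forming diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)
```

Because the fit is locally linear, a regression function that is exactly linear is reproduced without bias, which is the finite-sample counterpart of the bias term depending only on the second derivative.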

Observe $X'WX$ is a $(2 \times 2)$ matrix with $(i, j)$th element

\[ \{X'WX\}_{ij}(x) = \sum_{t=1}^{n} (x_t - x)^{i+j-2} K_h(x_t - x). \]

Note that by Lemma 6.1, 

- d,a) - £„(a) f 4 0 

t=l " J 

in£(^(M). Hence, for D := diag[l,/i„], 

e“i [D-~^iX'WX)D-^Xd^a) - /C£(a) 4 0 
in where /C [/ K;h+j-2] ] is positive definite. Thus by Lemma 6.2, 

sup\\[DiX'WXr^D]ix)\\T= sup \\mx'WXr^D]id,a)\\r<pe;\ (6.7) 

xeA® {a|Z;„(a)>e} 


To handle the bias term, note that by a Taylor series expansion 

No(^t) - PoM - /3i(x)(xt - x)| < m"(xt)|xt - x 
for all X where Xf e [x,Xf]. Hence for i e {1,2}, 


|{D-Vw[mo -4^]};(x)| < /t 2 m 2 (A^j 2 |Jc£+‘](x, - x)|, 

t=i 

whence by Proposition 2.1, 

1 7ti r1 

— sup|{D“^X^W[mo —X'l3]}i{x)\ < ——-—— sup 

JceA® fin xeM 


- ^)l hlm2(Ay). (6.8) 
For the variance term, note that for $i \in \{1, 2\}$



(6.9) 


by Theorem 2.1. (6.7)-(6.9) now yield the stated result. 


□ 
A Supplementary material

A.l Proofs of Lemmas 3.1-3.4 

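Proof of Lemma 3.1. Let $g_k(z) := g(z) 1\{|g(z)| \le k\}$. $g_k$ is bounded, and a straightforward
extension of the argument used to verify (9.1) in Duffy (2015) gives that

\[ \mathbb{E} f(Y) g_k(Z) = \frac{1}{2\pi} \int \hat{f}(\lambda)\, \mathbb{E}[e^{-i\lambda Y} g_k(Z)]\, d\lambda \]

for every $k \in \mathbb{N}$. Now let $k \to \infty$; the left side converges to $\mathbb{E} f(Y) g(Z)$ by dominated convergence.
For the right side, using that $Y_1$ and $(Y_2, Z)$ are independent, we have

\[ \Big| \int \hat{f}(\lambda)\, \mathbb{E}[e^{-i\lambda Y} \{g_k(Z) - g(Z)\}]\, d\lambda \Big| \le \Big( \int |\hat{f}(\lambda) \psi_{Y_1}(-\lambda)|\, d\lambda \Big) \mathbb{E}|g_k(Z) - g(Z)| \le \|f\|_1 \|\psi_{Y_1}\|_1 \mathbb{E}|g(Z)| 1\{|g(Z)| > k\} \to 0 \]

using the fact that $|\hat{f}(\lambda)| \le \|f\|_1$. □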

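Proof of Lemma 3.2. We shall give only the proof of (3.7) here; the proof of (3.6) follows by
similar arguments, and is somewhat simpler. Recall from (3.3) the decomposition

\[ x'_{t+1,t+k,t+k} = a_m \epsilon_{t+k-m} + \sum_{l=0,\ l \ne m}^{k-1} a_l \epsilon_{t+k-l}. \]

Let $\mathcal{K} := \{\lfloor k/2 \rfloor + 1, \dots, k - 1\} \setminus \{m\}$. Since the second term on the right is independent of
$\eta_{t+k-m}$,

\[ |\mathbb{E}\eta_{t+k-m} e^{-i\lambda x'_{t+1,t+k,t+k}}| \le \big[ |a_m||\lambda|\, \mathbb{E}|\eta_0 \epsilon_0| \wedge \mathbb{E}|\eta_0| \big] \prod_{l \in \mathcal{K}} |\psi(-\lambda a_l)| \lesssim (c_m|\lambda| \wedge 1) \prod_{l \in \mathcal{K}} |\psi(-\lambda a_l)| \]

using $|e^{ix} - 1| \le |x|$, (3.4) and the Cauchy-Schwarz inequality. Hence

\[ |\mathbb{E}\eta_{t+k-m} e^{-i\lambda x'_{t+1,t+k,t+k}}|^q \lesssim (c_m^q|\lambda|^q \wedge 1) \prod_{l \in \mathcal{K}} |\psi(-\lambda a_l)|. \]

Thus the left side of (3.7) may be bounded above by a constant times

\[ \int_{\mathbb{R}} (z_1 c_m^q |a_k|^p |\lambda|^{p+q} F(a_k \lambda) \wedge z_2) \prod_{l \in \mathcal{K}} |\psi(-\lambda a_l)|\, d\lambda. \]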

The result now follows by Lemma F.2 in the Supplement to Duffy (2015). □ 

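Proof of Lemma 3.3. (i) follows by arguments analogous to those used in the proof of Lemma 9.3(i)
in Duffy (2015). For (ii), we recall from (3.2) the decomposition

\[ x_{t+k} = x^*_{t,t+k} + x'_{t+1,t+k,t+k}. \]

Thence by Fourier inversion (Lemma 3.1),

\[ |\mathbb{E}_t f(x_{t+k})\eta_{t+k-m}| = \Big| \frac{1}{2\pi} \int_{\mathbb{R}} \hat{f}(\lambda) e^{-i\lambda x^*_{t,t+k}}\, \mathbb{E}[\eta_{t+k-m} e^{-i\lambda x'_{t+1,t+k,t+k}}]\, d\lambda \Big| \le \frac{1}{2\pi} \|f\|_1 \int_{\mathbb{R}} |\mathbb{E}\eta_{t+k-m} e^{-i\lambda x'_{t+1,t+k,t+k}}|\, d\lambda \]

using the fact that $|\hat{f}(\lambda)| \le \|f\|_1$. The result now follows by Lemma 3.2(i). □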

Proof of Lemma 3.4. For (i), note that {df^} is regularly varying with index —2H, whence by 
Karamata’s theorem and Proposition 1.5.9a in Bingham, Goldie, and Teugels (1987), {X!t=i 
is either slowly varying (when H < 1/2), or regularly varying with index 1 — 2H. In comparison, 
is regularly varying with index 


i(l-H)>l- 2 H 

for all H e (|, 1); thus (i) holds, (ii) follows from the fact that is regularly varying 

with index 

13 13 1 

- H < -- -1 

2 2 2 2 3 

For (hi), note that {Cm} and {m^^^e^} are regularly varying with indices H — 1/a < 1 and 

1 3 17 

- + 1-H < -= - 

2 2 3 6 

respectively. Thus the result follows from Assumption 1 (iii). □ 
A.2 List of key notation
Greek and Roman symbols

Listed in (Roman) alphabetical order. Greek symbols are listed according to their English names:
thus $\Omega$, as ‘omega’, appears before $\xi$, as ‘xi’.

Oj partial sum of O; := Xi)=o . ^ 

subset of M on which the normalised ‘signal’ exceeds e . (2.15) 

A^ slight enlargement of A^ . (2.19) 

a index of domain of attraction of Cq . Ass. l(i) 

BI bounded and integrable functions on M . Sec. 1 

BIL Lipschitz functions in BI . Sec. 2.1 

c„ norming sequence . (2.7) 

C generic constant. Sec. 1 

d„ norming sequence used to define . ( 2 . 8 ) 

5„(.^) appears in Prop. 2.2 . (2.13) 

norming sequence used to define . ( 2 . 8 ) 

Cf i.i.d. sequence. Ass. l(i) 

rjf i.i.d. sequence. Ass. l(i) 

truncated version of 17 f and remainder. (5.1) 

llqll^ defined as ||i7||„ . Sec. 2.3 

Ef expectation conditional on . Sec. 3 

cr-field generated by {er}r=s . Sec. 3 

subsets of BI . Sec. 2.3 

G specific slowly varying function . (3.5) 

h, bandwidth parameter (or sequence) . Ass. 3 

h^, h lower and upper bounds defining . Ass. 3 

H sets the decay rate of as k—> 00 . Ass. l(ii) 

set of allowable bandwidths . Ass. 3 

K smoothing kernel. Ass. 3 

^ucc(Q) bounded on compacta functions on Q, with ucc topology. Sec. 1 

^oo(Q) bounded functions on Q, with uniform topology . Sec. 1 

£ local time of A . ( 2 . 6 ) 

£{ sample estimate of local time . (2.9) 

rtiQ regression function . ( 2 . 1 ) 
$m_i(A)$ bounds the $i$th derivative of $m_0$ on $A \subseteq \mathbb{R}$ . (2.18)

m local level (Nadaraya-Watson) estimate of ntg . Sec. 2.1 

fhi local linear estimate of ntg . Sec. 2.1 

■^nmkf martingale components in decomposition of iSnm/. (4.2) 

Myijnf remainder from decomposition of . (4.2) 

sample space . Sec. 6 

Po chosen such t/i e . Ass. l(i) 

(py. coefficients defining the linear process Vf . Ass. l(ii) 

Tiy slowly varying sequence related to . Ass. l(ii) 

•xp characteristic function of eg . Ass. l(i) 

components in decomposition of ifi . (2.14) 

qg chosen such that EI rjg I< oo . Ass. l(i) 

rg used to define order of . Ass. 3 

truncated range of . Sec. 2.4 

norming sequence . (2.4) 

covariance Summation Operator, /(Xf)Uf“^ . (2.12), (4.2) 

Ti function x e^ — 1 . Sec. 3 

9y coefficients defining the linear process Uj . (2.3) 

Uf regression disturbance; linear process built from {pf} . (2.1), (2.3) 

analogues of Uf built from and . (5.2) 

Vf linear process built from {Cf} . (2.2) 

Xf regressor process; partial sum of {Vf} . (2.1), (2.2) 

Xj-j^ ith order statistic of {Xf}"^^; Xj-j^ < . Sec. 2.4 

x*^ J’ij^-measurable component of Xf . (3.2) 

Xj^j J'j-measurable component of Xf . (3.3) 

X finite-dimensional limit of an LFSM . (2.5) 

X„ process constructed from {Xf} . Sec. 2.2 

^mktf martingale difference components of . (4.3) 

jf dependent variable in the regression. (2.1) 

$Z_\alpha$ $\alpha$-stable Lévy motion . (2.4)
Symbols not connected to Greek or Roman letters

Ordered alphabetically by their description. 

both sides have the same distribution . Rem. 2.11 

[•] ceiling function . Sec. 1 

p 

—> converges in probability to . Sec. 2.4 

finite-dimensional convergence . Sec. 1 

[•J floor function (integer part) . Sec. 1 

/ Fourier transform of / . Sec. 3 

< left side bounded by a constant times the right side . Sec. 1 

<p left side bounded in probability by the right side . Sec. 1 

11/ lip LP norm, (/|/ {p^^p, for function / . Sec. 1 

denotes sup^gR|/(x)| when p = oo 

||X||p norm, (EIXP)^/^, for random variable X . Sec. 1 

(M) martingale conditional variance . (3.8) 

[M] martingale sum of squares . (3.8) 

number of elements in the (finite) set . Prop. 2.2 

||X||.^ Orlicz norm associated to function t . Sec. 3 

f^P'^ product X >-> x^/(x) . (2.19) 

~ strong asymptotic equivalence . Sec. 1 

(ciyi ~ hj^ if lim,2_,QQ Q.^/h^ 1) 

11,^11 supremum of norm IHI over sup^gjfll/II . Sec. 2.3 

weak asymptotic equivalence . Sec. 1 

(a„ if lim^^o^ ^ (-oo> oo)\{0}) 

weak convergence (van der Vaart and Wellner, 1996) . Sec. 1 